Conversation

@bitoku (Contributor) commented Sep 17, 2025

Because we removed containernetworking-plugins, we can't run containers without ovn-kubernetes.

We removed it in this commit: https://pkgs.devel.redhat.com/cgit/rpms/cri-o/commit/?h=rhaos-4.21-rhel-9&id=06797c8b73993ca1e930eec1870098dec3fde8ef

context: https://redhat-internal.slack.com/archives/C999USB0D/p1749135474527779

openshift-ci bot commented Sep 17, 2025

Hi @bitoku. Thanks for your PR.

I'm waiting for a coreos member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@gemini-code-assist (bot) left a comment


Code Review

This pull request removes several CRI-O container tests that can no longer be run due to the removal of containernetworking-plugins. The changes are mostly deletions of test code. While the core changes are correct, there are a couple of opportunities for further cleanup to improve code maintainability, which I've detailed in the specific comments.

// qemu machines cannot communicate between each other
ExcludePlatforms: []string{"qemu"},
})
}

Severity: medium

The description for the crio.base test on line 183 is now outdated. Since the networking tests have been removed, the description should be updated to no longer claim it verifies 'reliable networking'.

I suggest changing line 183 to:

Description: "Verify cri-o basic functions work, including storage driver is overlay, storage root is /var/lib/containers/storage, and cgroup driver is systemd",

Comment on lines 193 to 403
func crioBaseTests(c cluster.TestCluster) {
c.Run("crio-info", testCrioInfo)
c.Run("pod-continues-during-service-restart", crioPodContinuesDuringServiceRestart)
c.Run("networks-reliably", crioNetworksReliably)
}

// generateCrioConfig generates a crio pod/container configuration
// based on the input name and arguments returning the path to the generated configs.
func generateCrioConfig(podName, imageName string, command []string) (string, string, error) {
fileContentsPod := fmt.Sprintf(crioPodTemplate, podName, uuid.New().String(), imageName)

tmpFilePod, err := os.CreateTemp("", podName+"Pod")
if err != nil {
return "", "", err
}
defer tmpFilePod.Close()
if _, err = tmpFilePod.Write([]byte(fileContentsPod)); err != nil {
return "", "", err
}
cmd := strings.Join(command, " ")
fileContentsContainer := fmt.Sprintf(crioContainerTemplate, imageName, imageName, cmd)

tmpFileContainer, err := os.CreateTemp("", imageName+"Container")
if err != nil {
return "", "", err
}
defer tmpFileContainer.Close()
if _, err = tmpFileContainer.Write([]byte(fileContentsContainer)); err != nil {
return "", "", err
}

return tmpFilePod.Name(), tmpFileContainer.Name(), nil
}

// genContainer makes a container out of binaries on the host. This function uses podman to build.
// The first string returned by this function is the pod config to be used with crictl runp. The second
// string returned is the container config to be used with crictl create/exec. They will be dropped
// on to all machines in the cluster as ~/$STRING_RETURNED_FROM_FUNCTION. Note that the string returned
// here is just the name, not the full path on the cluster machine(s).
func genContainer(c cluster.TestCluster, m platform.Machine, podName, imageName string, binnames []string, shellCommands []string) (string, string, error) {
configPathPod, configPathContainer, err := generateCrioConfig(podName, imageName, shellCommands)
if err != nil {
return "", "", err
}
if err = cluster.DropFile(c.Machines(), configPathPod); err != nil {
return "", "", err
}
if err = cluster.DropFile(c.Machines(), configPathContainer); err != nil {
return "", "", err
}
// Create the crio image used for testing, only if it doesn't exist already
output := c.MustSSH(m, "sudo podman images -n --format '{{.Repository}}'")
if !strings.Contains(string(output), "localhost/"+imageName) {
util.GenPodmanScratchContainer(c, m, imageName, binnames)
}

return path.Base(configPathPod), path.Base(configPathContainer), nil
}

// crioNetwork ensures that crio containers can make network connections outside of the host
func crioNetwork(c cluster.TestCluster) {
machines := c.Machines()
src, dest := machines[0], machines[1]

c.Log("creating ncat containers")

// Since genContainer also generates crio pod/container configs,
// there will be a duplicate config file on each machine.
// Thus we only save one set for later use.
crioConfigPod, crioConfigContainer, err := genContainer(c, src, "ncat", "ncat", []string{"ncat", "echo"}, []string{"ncat"})
if err != nil {
c.Fatal(err)
}
_, _, err = genContainer(c, dest, "ncat", "ncat", []string{"ncat", "echo"}, []string{"ncat"})
if err != nil {
c.Fatal(err)
}

listener := func(ctx context.Context) error {
podID, err := c.SSHf(dest, "sudo crictl runp -T %s %s", overrideCrioOperationTimeoutSeconds, crioConfigPod)
if err != nil {
return err
}

containerID, err := c.SSHf(dest, "sudo crictl create -T %s --no-pull %s %s %s",
overrideCrioOperationTimeoutSeconds,
podID, crioConfigContainer, crioConfigPod)
if err != nil {
return err
}

// This command will block until a message is received
output, err := c.SSHf(dest, "sudo timeout 30 crictl exec %s echo 'HELLO FROM SERVER' | timeout 20 ncat --listen 0.0.0.0 9988 || echo 'LISTENER TIMEOUT'", containerID)
if err != nil {
return err
}
if string(output) != "HELLO FROM CLIENT" {
return fmt.Errorf("unexpected result from listener: %s", output)
}

return nil
}

talker := func(ctx context.Context) error {
// Wait until listener is ready before trying anything
for {
_, err := c.SSH(dest, "sudo ss -tulpn|grep 9988")
if err == nil {
break // socket is ready
}

exit, ok := err.(*ssh.ExitError)
if !ok || exit.Waitmsg.ExitStatus() != 1 { // 1 is the expected exit status of grep when nothing matches
return err
}

select {
case <-ctx.Done():
return fmt.Errorf("timeout waiting for server")
default:
time.Sleep(100 * time.Millisecond)
}
}
podID, err := c.SSHf(src, "sudo crictl runp -T %s %s", overrideCrioOperationTimeoutSeconds, crioConfigPod)
if err != nil {
return err
}

containerID, err := c.SSHf(src, "sudo crictl create -T %s --no-pull %s %s %s",
overrideCrioOperationTimeoutSeconds,
podID, crioConfigContainer, crioConfigPod)
if err != nil {
return err
}

output, err := c.SSHf(src, "sudo crictl exec %s echo 'HELLO FROM CLIENT' | ncat %s 9988",
containerID, dest.PrivateIP())
if err != nil {
return err
}
if string(output) != "HELLO FROM SERVER" {
return fmt.Errorf(`unexpected result from listener: "%s"`, output)
}

return nil
}

ctx, cancel := context.WithTimeout(context.Background(), time.Minute)
defer cancel()

if err := worker.Parallel(ctx, listener, talker); err != nil {
c.Fatal(err)
}
}

// crioNetworksReliably verifies that crio containers have a reliable network
func crioNetworksReliably(c cluster.TestCluster) {
m := c.Machines()[0]
hostIP := "127.0.0.1"

// Here we generate 10 pods, each will run a container responsible for
// pinging to host
output := ""
for x := 1; x <= 10; x++ {
// append int to name to avoid pod name collision
crioConfigPod, crioConfigContainer, err := genContainer(
c, m, fmt.Sprintf("ping%d", x), "ping", []string{"ping"},
[]string{"ping"})
if err != nil {
c.Fatal(err)
}

cmdCreatePod := fmt.Sprintf("sudo crictl runp -T %s %s", overrideCrioOperationTimeoutSeconds, crioConfigPod)
podID := c.MustSSH(m, cmdCreatePod)
containerID := c.MustSSH(m, fmt.Sprintf("sudo crictl create -T %s --no-pull %s %s %s",
overrideCrioOperationTimeoutSeconds,
podID, crioConfigContainer, crioConfigPod))
output = output + string(c.MustSSH(m, fmt.Sprintf("sudo crictl exec %s ping -i 0.2 %s -w 1 >/dev/null && echo PASS || echo FAIL", containerID, hostIP)))
}

numPass := strings.Count(string(output), "PASS")
if numPass != 10 {
c.Fatalf("Expected 10 passes, but received %d passes with output: %s", numPass, output)
}

}

Severity: medium

With the removal of several tests in this file, the following package-level variables are now unused and should be removed for better code hygiene:

  • overrideCrioOperationTimeoutSeconds
  • crioPodTemplate
  • crioContainerTemplate

@travier (Member) commented Sep 17, 2025

Can you be more precise and mention in which versions we removed that?

@haircommander

I wonder if we could package the bridge plugin in this repo and install it just to run these smoke tests. or even pull with podman container. I don't love losing the smoke tests

@haircommander

that could be a follow up if we're blocked on these failures though

@bitoku (Contributor, Author) commented Sep 18, 2025

If this is a critical blocker, we could revert the changes in cri-o RPM and defer it to 4.22 whose base image will be rhcos 10.

@jlebon (Member) commented Sep 18, 2025

I wonder if we could package the bridge plugin in this repo and install it just to run these smoke tests. or even pull with podman container. I don't love losing the smoke tests

I think my preference would be to modify the tests to exercise paths closer to what would be done in OCP. E.g. can we set up a minimal ovn-kubernetes config in these tests?

@haircommander

ovn-k needs an apiserver running afaiu, in these tests cri-o is standalone right?

@jlebon (Member) commented Sep 18, 2025

ovn-k needs an apiserver running afaiu, in these tests cri-o is standalone right?

Indeed.

or even pull with podman container

  1. Is that available and maintained somewhere?
  2. How much overlap does testing it that way have with how cri-o is actually used in OCP? I suppose from the point of view of cri-o, it's transparent? Also worried about us then having to debug issues more related with that bespoke test setup than actually uncovering cri-o bugs.

@haircommander

we also could just test host network pods, which don't need a networking plugin to run. Functionally, from the perspective of openshift, that's what we'd be testing anyway: the networking plugin we actually use isn't running in these tests, so they weren't giving us coverage of network-plugin-specific behavior. That way, we retain the smoke test aspect of it. We'd still be testing a narrow scope of cri-o features, but most of this is to make sure cri-o isn't completely borked before pulling it into rhcos anyway
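
In CRI terms, "host network pods" means setting the sandbox's network namespace mode to NODE, in which case CRI-O never invokes a CNI plugin for the pod. A minimal pod sandbox config sketch for `crictl runp` (field names follow the CRI `PodSandboxConfig` message; the pod name and uid are placeholders, and the exact config used in this PR may differ):

```json
{
  "metadata": {
    "name": "ping-host",
    "namespace": "default",
    "uid": "2dec2d8c-0000-0000-0000-000000000000",
    "attempt": 0
  },
  "log_directory": "/tmp",
  "linux": {
    "security_context": {
      "namespace_options": {
        "network": 2
      }
    }
  }
}
```

Here `"network": 2` corresponds to `NamespaceMode_NODE` in the CRI API, i.e. the sandbox shares the host's network namespace instead of getting one from a CNI plugin.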

@haircommander

so this test would be less relevant https://github.com/coreos/coreos-assembler/pull/4325/files#diff-e4d06499126559f6093ccf8c73c956a201386983c47a439542eaf42773dd132aL220 but we'd still get something from it I think

Because we removed containernetworking-plugins, we can't run containers with CNI
@bitoku (Contributor, Author) commented Sep 22, 2025

I changed to use hostNetwork in the tests.
@travier @jlebon Can any of you run this modified test with the latest crio rpm?

@jlebon (Member) left a comment

That works for me!

This test is currently failing in the pipeline, so let's just get this in and try it out there.

@jlebon jlebon merged commit b612599 into coreos:main Sep 22, 2025
2 of 4 checks passed